Markov chain order estimation with parametric significance tests of conditional mutual information

Authors

  • Maria Papapetrou
  • Dimitris Kugiumtzis
Abstract

Despite the different approaches suggested in the literature, accurate estimation of the order of a Markov chain from a given symbol sequence remains an open issue, especially when the order is moderately large. Here, parametric significance tests of the conditional mutual information (CMI) of order m, Ic(m), are conducted on a symbol sequence for increasing orders m in order to estimate the true order L of the underlying Markov chain. The CMI of order m is the mutual information of two variables in the Markov chain that are m time steps apart, conditioned on the intermediate variables of the chain. The null distribution of CMI is approximated by a normal and a gamma distribution, for which analytic expressions of the parameters are derived, and by a gamma distribution whose parameters are obtained from the mean and variance of the normal distribution. The accuracy of order estimation is assessed with the three parametric tests, and the parametric tests are compared to the randomization significance test and to other known order estimation criteria using Monte Carlo simulations of Markov chains with different orders L, symbol sequence lengths N and numbers of symbols K. The parametric test using the gamma distribution (with directly defined parameters) is consistently better than the other two parametric tests and matches well the performance of the randomization test. The tests are applied to genes and intergenic regions of DNA sequences, and the estimated orders are interpreted in view of the results from the simulation study. The application shows the usefulness of the parametric gamma test for long symbol sequences, where the randomization test becomes prohibitively slow to compute.
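To make the sequential testing scheme concrete, here is a minimal Python sketch (not the authors' implementation): Ic(m) is computed with a plug-in estimator built from block entropies, and the order is taken as the last m at which a gamma-based test rejects the null hypothesis of zero CMI. The gamma parameters below follow the standard chi-squared asymptotics of the likelihood-ratio statistic, an illustrative assumption rather than the analytic expressions derived in the paper; the function names block_entropy, cmi and estimate_order are hypothetical.

```python
# Minimal sketch: sequential gamma tests of CMI for increasing order m.
# Illustrative null model: 2*n*Ic(m) ~ chi2 with (K-1)^2 * K^(m-1) d.o.f.,
# i.e. Ic(m) ~ Gamma(shape=dof/2, scale=1/n) under the null (an assumption,
# not necessarily the parameterization used in the paper).
from collections import Counter
import numpy as np
from scipy.stats import gamma

def block_entropy(seq, length):
    """Shannon entropy (nats) of overlapping blocks of the given length."""
    if length == 0:
        return 0.0
    counts = Counter(tuple(seq[i:i + length]) for i in range(len(seq) - length + 1))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log(p))

def cmi(seq, m):
    """Plug-in estimate of Ic(m) = I(x_t ; x_{t-m} | x_{t-1}, ..., x_{t-m+1})."""
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) reduces to block entropies.
    return 2 * block_entropy(seq, m) - block_entropy(seq, m + 1) - block_entropy(seq, m - 1)

def estimate_order(seq, K, max_order=10, alpha=0.05):
    """Return the largest m for which the gamma test rejects H0: Ic(m) = 0."""
    for m in range(1, max_order + 1):
        n = len(seq) - m                      # number of (m+1)-blocks
        dof = (K - 1) ** 2 * K ** (m - 1)     # conditioning on m-1 symbols
        p_value = gamma.sf(cmi(seq, m), a=dof / 2, scale=1.0 / n)
        if p_value >= alpha:                  # first non-significant CMI
            return m - 1                      # estimated order L_hat = m - 1
    return max_order

# Example: an i.i.d. (order-0) binary sequence should usually give order 0.
rng = np.random.default_rng(1)
print(estimate_order(rng.integers(0, 2, size=5000), K=2))
```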


Similar articles

Markov Chain Order Estimation with Conditional Mutual Information

We introduce the Conditional Mutual Information (CMI) for the estimation of the Markov chain order. For a Markov chain of K symbols, we define CMI of order m, Ic(m), as the mutual information of two variables in the chain being m time steps apart, conditioning on the intermediate variables of the chain. We find approximate analytic significance limits based on the estimation bias of CMI and dev...
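Written out for a stationary symbol sequence {x_t}, the definition above takes the standard CMI form, which decomposes into block (joint) entropies; this is a direct consequence of the definition and is stated here only for clarity:

```latex
I_c(m) = I\bigl(x_t \,;\, x_{t-m} \mid x_{t-1}, \ldots, x_{t-m+1}\bigr)
       = H(x_t, x_{t-1}, \ldots, x_{t-m+1}) + H(x_{t-1}, \ldots, x_{t-m})
         - H(x_t, x_{t-1}, \ldots, x_{t-m}) - H(x_{t-1}, \ldots, x_{t-m+1})
```

For m = 1 the conditioning set is empty and I_c(1) reduces to the ordinary mutual information I(x_t ; x_{t-1}).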


Feature extraction for EEG classification: representing electrode outputs as a Markov stochastic process

In this work we introduce a new model for representing EEG signals and extracting discriminative features. We treat the outputs of each electrode as a stochastic process and assume that the sequence of variables forming a process is stationary and Markov. To capture temporal dependences within an electrode we use conditional entropy, and to capture dependences between different electrodes we us...


Evaluation of First and Second Markov Chains Sensitivity and Specificity as Statistical Approach for Prediction of Sequences of Genes in Virus Double Strand DNA Genomes

The growing amount of information on biological sequences has made the application of statistical approaches necessary for modeling and estimating their functions. In this paper, the sensitivity and specificity of the first and second Markov chains for the prediction of genes were evaluated using the complete double-stranded DNA virus. There were two approaches for the prediction of each Markov model parameter,...


Financial Risk Modeling with Markov Chain

Investors use different approaches to select an optimal portfolio, so optimal investment choices according to return can be interpreted with different models. The traditional approach to portfolio selection is the mean-variance framework. Another approach is the Markov chain, a random process without memory. This means that the conditional probability distribution of the nex...


Exact Test of Independence Using Mutual Information

Using a recently discovered method for producing random symbol sequences with prescribed transition counts, we present an exact null hypothesis significance test (NHST) for mutual information between two random variables, the null hypothesis being that the mutual information is zero (i.e., independence). The exact tests reported in the literature assume that data samples for each variable are s...
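As a rough, hypothetical illustration of significance testing of mutual information, the sketch below combines a plug-in MI estimate with a simple permutation surrogate; note that freely permuting one variable does not preserve transition counts, so it is a simplification of the exact randomization scheme described above, and the function names are hypothetical.

```python
# Rough illustration: permutation NHST for H0: I(X;Y) = 0 between two symbol
# sequences.  Permuting y freely is a simplification and does NOT preserve
# transition counts as the exact test described above does.
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats from two equal-length symbol arrays."""
    h = lambda p: -np.sum(p * np.log(p))
    _, cxy = np.unique(np.stack([x, y], axis=1), axis=0, return_counts=True)
    _, cx = np.unique(x, return_counts=True)
    _, cy = np.unique(y, return_counts=True)
    return h(cx / cx.sum()) + h(cy / cy.sum()) - h(cxy / cxy.sum())

def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """One-sided p-value: how often a permuted-y MI reaches the observed MI."""
    rng = np.random.default_rng(seed)
    i_obs = mutual_information(x, y)
    exceed = sum(mutual_information(x, rng.permutation(y)) >= i_obs
                 for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)
```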



Journal:
  • Simulation Modelling Practice and Theory

Volume 61, Issue -

Pages -

Publication date: 2016